Output Manager Bloat Removal - Implementation Plan

Project: ACM V8 SQL - Output Manager Optimization
Date Created: November 30, 2025
Branch: refactor/output-manager-bloat-removal
File Target: core/output_manager.py (5,520 lines → ~3,700-4,000 lines)
Objective: Remove CSV writing and chart generation code for SQL-only mode
Expected Reduction: ~1,500-1,800 lines (27-33% of file)
Estimated Effort: 16-23 hours (2-3 days with testing)


Executive Summary

The OutputManager has accumulated significant bloat from legacy CSV file operations and chart generation code. Since the system now operates exclusively in SQL-only mode, this refactoring removes chart generation, CSV writing and loading, redundant conditionals, and unused helpers, cutting roughly 1,500-1,800 lines (27-33%) from the file.


Pre-Implementation Checklist

Environment Setup

Risk Assessment


Phase 1: Remove Chart Generation Infrastructure

Priority: P0 (Highest)
Time Estimate: 3-4 hours
Lines to Remove: ~940 lines (17% of file)
Risk Level: Low (protected by sql_only_mode guard)

Task 1.1: Delete Main Chart Generation Method

Location: Lines 2729-3682 (954 lines)

What to Remove:

What to Keep:

def generate_default_charts(
    self,
    scores_df: pd.DataFrame,
    episodes_df: pd.DataFrame,
    cfg: Dict[str, Any],
    charts_dir: Union[str, Path],
    sensor_context: Optional[Dict[str, Any]] = None
) -> List[Path]:
    """Chart generation disabled in SQL-only mode."""
    if self.sql_only_mode:
        Console.info("[CHARTS] SQL-only mode: Skipping chart generation")
        return []
    
    Console.warn("[CHARTS] Chart generation is deprecated and disabled")
    return []

Implementation Steps:

  1. Replace entire method with stub (lines 2729-3682)
  2. Remove helper functions:
    • _can_render() - Chart precondition checker
    • _safe_save() - Chart save wrapper
  3. Remove chart logging: chart_log: List[Dict[str, Any]] = []
  4. Update method docstring to indicate deprecation
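Before deleting _can_render() and _safe_save(), it can help to confirm no call sites remain; grep works, but an AST scan avoids false hits in strings and comments. A minimal generic sketch (not part of the source; run it over the real file contents):

```python
import ast
from typing import List, Set, Tuple

def find_calls(source: str, names: Set[str]) -> List[Tuple[str, int]]:
    """Return (name, line) for every call to one of `names` in `source`."""
    hits = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Call):
            func = node.func
            # Plain calls: _can_render(...); method calls: self._can_render(...)
            callee = func.id if isinstance(func, ast.Name) else (
                func.attr if isinstance(func, ast.Attribute) else None)
            if callee in names:
                hits.append((callee, node.lineno))
    return hits

sample = """
def render(self, fig, path):
    if self._can_render(fig):
        self._safe_save(fig, path)
"""
print(find_calls(sample, {"_can_render", "_safe_save"}))
```

grep remains the quick check; the AST pass is useful when helper names also appear in comments or chart filenames.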

Task 1.2: Remove Matplotlib Imports

Location: Around line 2743

What to Remove:

try:
    import matplotlib.pyplot as plt  # type: ignore
    from matplotlib import dates as mdates  # type: ignore
except Exception as exc:
    Console.warn(f"[CHARTS] Matplotlib unavailable: {exc}")
    return []

Verification:

grep -n "matplotlib" core/output_manager.py

Expected: No remaining matplotlib references after removal

Task 1.3: Remove Commented-Out Chart Code

What to Remove:

Search Pattern:

grep -n "# sensor_hotspots.png\|# sensor_sparklines.png\|# sensor_timeseries_events.png" core/output_manager.py

Location: Lines ~150-160

Keep: SEVERITY_COLORS dict (used by SQL analytics)

Update: Remove chart-specific comments referring to PNG output

Validation Checklist - Phase 1:


Phase 2: Remove CSV Writing Infrastructure

Priority: P0 (Highest)
Time Estimate: 2-3 hours
Lines to Remove: ~80 lines (1.5% of file)
Risk Level: Low (SQL mode is default)

Task 2.1: Remove CSV Batch Writing Method

Location: Lines 2036-2065 (30 lines)

Method to Delete: batch_write_csvs()

Full Method Signature:

def batch_write_csvs(self, csv_data: Dict[Path, pd.DataFrame]) -> Dict[Path, Dict[str, Any]]:

Verification Before Removal:

grep -n "batch_write_csvs" core/output_manager.py
grep -r "batch_write_csvs" core/ --include="*.py"

Task 2.2: Remove Optimized CSV Writer Helper

Location: Lines 1123-1141 (19 lines)

Method to Delete: _write_csv_optimized()

Full Method Signature:

def _write_csv_optimized(self, df: pd.DataFrame, path: Path, **kwargs) -> None:

Task 2.3: Remove CSV Fields from OutputBatch

Location: Line 364

What to Change:

# BEFORE:
@dataclass
class OutputBatch:
    """Represents a batch of outputs to be written together."""
    csv_files: Dict[Path, pd.DataFrame] = field(default_factory=dict)  # DELETE THIS
    json_files: Dict[Path, Dict[str, Any]] = field(default_factory=dict)
    sql_operations: List[Tuple[str, pd.DataFrame, Dict[str, Any]]] = field(default_factory=list)
    # ...
# AFTER:
@dataclass
class OutputBatch:
    """Represents a batch of outputs to be written together."""
    json_files: Dict[Path, Dict[str, Any]] = field(default_factory=dict)
    sql_operations: List[Tuple[str, pd.DataFrame, Dict[str, Any]]] = field(default_factory=list)
    # ...

Task 2.4: Simplify Batch Flush Logic

Location: Lines 2086-2088 in flush() method

What to Remove:

# DELETE THESE LINES:
if self._current_batch.csv_files and not self.sql_only_mode:
    self.batch_write_csvs(self._current_batch.csv_files)
    self._current_batch.csv_files.clear()

Keep:

Task 2.5: Update write_dataframe Documentation

Location: Lines 1160-1203

Update Docstring:

def write_dataframe(self, 
                   df: pd.DataFrame, 
                   file_path: Path,
                   sql_table: Optional[str] = None,
                   sql_columns: Optional[Dict[str, str]] = None,
                   non_numeric_cols: Optional[set] = None,
                   add_created_at: bool = False,
                   allow_repair: bool = True,
                   **csv_kwargs) -> Dict[str, Any]:
    """
    Write DataFrame to SQL (file output disabled).
    
    Args:
        df: DataFrame to write
        file_path: Path used as cache key only (no file written)
        sql_table: SQL table name (required in SQL-only mode)
        sql_columns: Column mapping for SQL (df_col -> sql_col)
        non_numeric_cols: Columns to treat as non-numeric for SQL
        add_created_at: Whether to add CreatedAt timestamp column
        allow_repair: If False, block write when required fields missing
        **csv_kwargs: Ignored (legacy compatibility)
        
    Returns:
        Dictionary with write results and metadata
    """

Validation Checklist - Phase 2:


Phase 3: Remove CSV Data Loading

Priority: P1 (High)
Time Estimate: 2-3 hours
Lines to Remove: ~200 lines (3.6% of file)
Risk Level: Medium (verify SQL loading works)

Task 3.1: Remove CSV Reader Helper

Location: Lines 334-362 (29 lines)

Method to Delete: _read_csv_with_peek()

Signature:

def _read_csv_with_peek(path: Union[str, Path], ts_col_hint: Optional[str], 
                        engine: Optional[str] = None) -> Tuple[pd.DataFrame, str]:

Verification:

grep -n "_read_csv_with_peek" core/output_manager.py

Task 3.2: Simplify load_data Method

Location: Lines 619-808 (190 lines total)

Current Structure:

What to Keep:

def load_data(self, cfg: Dict[str, Any], start_utc: Optional[pd.Timestamp] = None, 
              end_utc: Optional[pd.Timestamp] = None, 
              equipment_name: Optional[str] = None, 
              sql_mode: bool = False):
    """
    Load training and scoring data from SQL historian.
    
    Args:
        cfg: Configuration dictionary
        start_utc: Start time for SQL window queries
        end_utc: End time for SQL window queries
        equipment_name: Equipment name for SQL historian queries
        sql_mode: Must be True (file mode deprecated)
    """
    # SQL mode: Load from historian SP
    if sql_mode:
        return self._load_data_from_sql(cfg, equipment_name, start_utc, end_utc, 
                                       is_coldstart=False)
    
    # OM-CSV-01: sql_mode is False at this point, so CSV loading always raises
    raise ValueError(
        "[DATA] OutputManager is in SQL-only mode. CSV file loading is deprecated. "
        "Use sql_mode=True with equipment_name and time windows."
    )

Lines to Delete:

  1. Lines 677-730: Cold-start mode CSV splitting
  2. Lines 732-736: Normal CSV file reads
  3. Lines 740-808: CSV validation, cadence check, resampling

Keep Intact:

Task 3.3: Update load_data Docstring

Old Documentation References to Remove:

New Documentation:

"""
Load training and scoring data from SQL historian.

Consolidates data loading into OutputManager for unified I/O pipeline.
File-based loading (CSV mode) has been deprecated - SQL-only operation required.

Args:
    cfg: Configuration dictionary with data.* settings
    start_utc: Start time for SQL window query (required for SQL mode)
    end_utc: End time for SQL window query (required for SQL mode)
    equipment_name: Equipment name for historian (e.g., 'FD_FAN', 'GAS_TURBINE')
    sql_mode: Must be True (legacy parameter for compatibility)

Returns:
    Tuple of (train_df, score_df, metadata)

Raises:
    ValueError: If sql_mode=False or SQL-only mode prevents CSV reads

Example:
    >>> om = OutputManager(sql_client=client, sql_only_mode=True)
    >>> train, score, meta = om.load_data(
    ...     cfg=config,
    ...     start_utc=pd.Timestamp('2025-01-01'),
    ...     end_utc=pd.Timestamp('2025-01-02'),
    ...     equipment_name='FD_FAN',
    ...     sql_mode=True
    ... )
"""

Validation Checklist - Phase 3:


Phase 4: Simplify Conditional Checks

Priority: P2 (Medium)
Time Estimate: 1 hour
Lines to Remove: ~24 lines (<1% of file)
Risk Level: Low (safe simplification)

Task 4.1: Remove Redundant sql_only_mode Checks

These methods have sql_only_mode guards that can be removed since we're removing the functionality entirely:

Remove Check #1: write_json()

Location: Lines 1507-1509

Before:

def write_json(self, data: Dict[str, Any], file_path: Path) -> None:
    """Write JSON data to file."""
    if self.sql_only_mode:
        Console.info("[OUTPUT] SQL-only mode: Skipping JSON file write")
        return
    file_path.parent.mkdir(parents=True, exist_ok=True)
    # ...

After:

def write_json(self, data: Dict[str, Any], file_path: Path) -> None:
    """Write JSON data to file (deprecated - use SQL tables for metadata)."""
    Console.warn("[OUTPUT] JSON file writes are deprecated. Use SQL tables for metadata.")
    return

Remove Check #2: write_jsonl()

Location: Lines 1520-1522

Similar simplification as write_json

Remove Check #3: Episode Severity JSON

Location: Lines 2021-2029 in write_episodes()

What to Remove:

# Episode severity mapping for dashboard (file only, no SQL table)
if not self.sql_only_mode:
    severity_file = run_dir / "episode_severity_mapping.json"
    severity_mapping = self._generate_episode_severity_mapping(episodes_df)
    try:
        self.write_json(severity_mapping, severity_file)
    except Exception as e:
        Console.warn(f"[OUTPUT] Failed to write episode severity mapping: {e}")

Remove Check #4: Batch JSON Flush

Location: Lines 2090-2092 in flush()

What to Remove:

if self._current_batch.json_files and not self.sql_only_mode:
    # Batch write JSON files
    # ... (implementation)

Remove Check #5: Schema Descriptor Write

Location: Lines 2708-2715

What to Remove:

if not self.sql_only_mode:
    schema_file = tables_dir / "table_schemas.json"
    schema_desc = self._generate_schema_descriptor(tables_dir)
    try:
        self.write_json(schema_desc, schema_file)
    except Exception as e:
        Console.warn(f"[OUTPUT] Failed to write schema descriptor: {e}")

Task 4.2: Keep Critical Guards

Guard #1: Chart Generation (KEEP) Location: Line 2738

def generate_default_charts(...) -> List[Path]:
    if self.sql_only_mode:
        Console.info("[CHARTS] SQL-only mode: Skipping chart generation")
        return []

Reason: Safety guard for any remaining callers

Guard #2: CSV Load Prevention (KEEP) Location: Line 670

if self.sql_only_mode and not sql_mode:
    raise ValueError("[DATA] OutputManager is in sql_only_mode...")

Reason: Critical safety check to prevent accidental CSV reads

Task 4.3: Simplify Constructor Default

Location: Line 397

Before:

def __init__(self, 
             sql_client=None, 
             run_id: Optional[str] = None,
             equip_id: Optional[int] = None,
             batch_size: int = 5000,
             enable_batching: bool = True,
             sql_health_cache_seconds: float = 60.0,
             max_io_workers: int = 8,
             base_output_dir: Optional[Union[str, Path]] = None,
             batch_flush_rows: int = 1000,
             batch_flush_seconds: float = 30.0,
             max_in_flight_futures: int = 50,
             sql_only_mode: bool = False):  # <-- Change this

After:

sql_only_mode: bool = True):  # <-- Default to True

Add Deprecation Warning:

if not sql_only_mode:
    Console.warn(
        "[OUTPUT] File-based mode is deprecated. "
        "sql_only_mode=False will be removed in future version. "
        "System will operate in SQL-only mode regardless."
    )
    sql_only_mode = True  # Force SQL-only

Validation Checklist - Phase 4:


Phase 5: Remove Helper Methods

Priority: P2 (Medium)
Time Estimate: 1-2 hours
Lines to Remove: ~90 lines (1.6% of file)
Risk Level: Low (unused utilities)

Task 5.1: Remove Schema Descriptor Generator

Location: Lines 3685-3730 (46 lines)

Method to Delete: _generate_schema_descriptor()

What It Does:

Why Remove:

Verification:

grep -rn "_generate_schema_descriptor" core/ --include="*.py"

Expected: Only definition and one call in generate_all_analytics_tables() around line 2710

What to Do:

  1. Delete method definition (lines 3685-3730)
  2. Remove call from generate_all_analytics_tables() (lines 2708-2715)

Task 5.2: Evaluate Episode Severity JSON Generator

Location: Lines 3732-3775 (44 lines)

Method: _generate_episode_severity_mapping()

Decision Tree:

Is it used by SQL analytics tables?
├── YES → Keep and simplify to SQL-only
└── NO → Delete completely

Check Usage:

grep -rn "_generate_episode_severity_mapping" core/ --include="*.py"

Current Usage:

Decision: DELETE - Only used for file output

Task 5.3: Clean Up File Path References

Search for CSV references:

grep -n "\.csv" core/output_manager.py | grep -v "# "

Common Patterns to Update:

  1. Comment Examples:

    # OLD: cache_key = file_path.name  # e.g., "scores.csv"
    # NEW: cache_key = file_path.name  # e.g., "scores_wide"
    
  2. Docstring Examples:

    # OLD: >>> scores = output_manager.get_cached_table("scores.csv")
    # NEW: >>> scores = output_manager.get_cached_table("scores_wide")
    
  3. Variable Names:

    # OLD: tables_dir / "pca_metrics.csv"
    # NEW: Keep for cache key compatibility
    

Strategy: Leave .csv in cache keys for backward compatibility, update only documentation
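One low-risk way to honor that strategy is to normalize keys at lookup time, so legacy .csv-suffixed keys and new bare keys resolve to the same cache entry. A hypothetical helper (normalize_cache_key does not exist in the source):

```python
from pathlib import Path
from typing import Union

def normalize_cache_key(key: Union[str, Path]) -> str:
    """Map 'scores.csv', Path('out/scores.csv'), and 'scores' all to 'scores'."""
    name = Path(key).name
    return name[:-4] if name.endswith(".csv") else name

cache = {"scores_wide": "cached-dataframe"}
# A legacy caller passing a .csv path still resolves to the same entry:
print(cache.get(normalize_cache_key("output/scores_wide.csv")))  # → cached-dataframe
```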

Task 5.4: Remove JSON File Batch Infrastructure

Location: Line 365 (OutputBatch dataclass)

Evaluate: json_files: Dict[Path, Dict[str, Any]]

Usage Check:

grep -n "json_files" core/output_manager.py

Decision:

Validation Checklist - Phase 5:


Phase 6: Update Documentation & Imports

Priority: P3 (Low)
Time Estimate: 1 hour
Lines Modified: ~50 lines
Risk Level: None (documentation only)

Task 6.1: Update Module Docstring

Location: Lines 1-13

Before:

"""
Unified Output Manager for ACM
==============================

Consolidates all scattered output generation into a single, efficient system:
- Batched file writes with intelligent buffering
- Smart SQL/file dual-write coordination with caching
- Single point of control for all CSV, JSON, and model outputs
- Performance optimizations: vectorized operations, reduced I/O
- Unified error handling and logging

This replaces scattered to_csv() calls throughout the codebase and provides
consistent behavior for all output operations.
"""

After:

"""
Unified Output Manager for ACM
==============================

Consolidates all output generation into a single, efficient SQL-based system:
- Batched SQL writes with intelligent buffering and connection pooling
- Smart artifact caching for in-memory data exchange between modules
- Single point of control for all SQL table writes and analytics generation
- Performance optimizations: vectorized operations, reduced I/O overhead
- Unified error handling and logging

Operates exclusively in SQL-only mode. File-based CSV/chart operations have
been deprecated for performance and operational simplicity.
"""

Task 6.2: Clean Up Imports

Check for Unused Imports:

pylint core/output_manager.py --disable=all --enable=unused-import

Expected Removals:

Verify Required Imports:

Task 6.3: Update Key Method Docstrings

Update #1: write_dataframe()

Location: Lines 1160-1180

Add to Docstring:

"""
Write DataFrame to SQL (file output deprecated).

**SQL-Only Mode:** This method only writes to SQL tables. The file_path
parameter is retained for cache key compatibility but no file is written.

**Breaking Change:** CSV file writing has been removed. Use SQL tables for
all persistent storage.

Args:
    df: DataFrame to write
    file_path: Path used as artifact cache key (no file written)
    sql_table: SQL table name (required)
    sql_columns: Optional column mapping (df_col -> sql_col)
    non_numeric_cols: Columns to preserve as non-numeric
    add_created_at: Add CreatedAt timestamp to SQL row
    allow_repair: Allow auto-repair of missing required fields
    
Returns:
    Dict with keys: sql_written (bool), rows (int), error (str or None)
    
Raises:
    ValueError: If sql_table not in ALLOWED_TABLES
    
Example:
    >>> result = om.write_dataframe(
    ...     df=scores,
    ...     file_path=Path("scores_wide"),  # Cache key only
    ...     sql_table="ACM_Scores_Wide",
    ...     sql_columns={"timestamp": "Timestamp", "fused": "fused"}
    ... )
    >>> print(result['sql_written'])  # True
"""

Update #2: load_data()

Already covered in Phase 3

Update #3: generate_default_charts()

Already covered in Phase 1

Update #4: Class-level Docstring

Location: Lines 375-385

Update:

class OutputManager:
    """
    Unified output manager for SQL-based analytics and model persistence.
    
    Features:
    - Batched SQL writes with connection pooling for high performance
    - Automatic schema validation and column mapping
    - Artifact caching for in-memory data exchange (FCST-15)
    - Thread-safe operations with backpressure control
    - Intelligent error handling with auto-repair capabilities
    - Comprehensive analytics table generation
    
    **Architecture:** SQL-only mode. All persistent storage uses SQL Server.
    File-based operations (CSV, charts) have been deprecated.
    
    **Usage:**
        >>> om = OutputManager(
        ...     sql_client=client,
        ...     run_id=run_id,
        ...     equip_id=equip_id,
        ...     sql_only_mode=True  # Default
        ... )
        >>> om.write_dataframe(df, Path("cache_key"), sql_table="ACM_Scores_Wide")
    """

Task 6.4: Update Type Hints

Check All Method Signatures:

Example Fix:

# Before:
def write_dataframe(self, ..., **csv_kwargs) -> Dict[str, Any]:

# After (keep for compatibility):
def write_dataframe(self, ..., **csv_kwargs) -> Dict[str, Any]:
    """... Note: csv_kwargs parameter is ignored (legacy compatibility)"""
    if csv_kwargs:
        Console.warn("[OUTPUT] csv_kwargs parameter is deprecated and ignored")

Task 6.5: Add Deprecation Notices

Create Deprecation Section:

# ==================== DEPRECATED METHODS ====================
# The following methods are stubs for backward compatibility.
# They will be removed in a future version.

def write_json(self, data: Dict[str, Any], file_path: Path) -> None:
    """DEPRECATED: Use SQL tables for metadata instead."""
    Console.warn("[OUTPUT] write_json is deprecated. Use SQL tables.")
    return

def write_jsonl(self, records: List[Dict[str, Any]], file_path: Path) -> None:
    """DEPRECATED: Use SQL tables for metadata instead."""
    Console.warn("[OUTPUT] write_jsonl is deprecated. Use SQL tables.")
    return
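Alongside Console.warn, the stubs could also emit a standard DeprecationWarning, which test suites can assert on (pytest.warns) and python -W error can escalate. A sketch under that assumption, with a minimal Console stand-in for the project's logger:

```python
import warnings
from pathlib import Path
from typing import Any, Dict

class Console:
    """Minimal stand-in for the project's logger."""
    @staticmethod
    def warn(msg: str) -> None:
        print(msg)

def write_json(data: Dict[str, Any], file_path: Path) -> None:
    """DEPRECATED: Use SQL tables for metadata instead."""
    warnings.warn("write_json is deprecated; use SQL tables",
                  DeprecationWarning, stacklevel=2)
    Console.warn("[OUTPUT] write_json is deprecated. Use SQL tables.")

with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    write_json({}, Path("ignored.json"))
print(len(caught))  # → 1
```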

Validation Checklist - Phase 6:


Phase 7: Caller Migration

Priority: P1 (High)
Time Estimate: 2-3 hours
Lines Modified: External files
Risk Level: Medium (integration testing required)

Task 7.1: Update acm_main.py

Search for Usages:

grep -n "generate_default_charts\|batch_write_csvs\|write_csv\|write_json" core/acm_main.py

Expected Findings:

  1. Chart generation call (likely around line 4200-4300)
  2. OutputManager initialization

Changes Needed:

Change #1: Remove Chart Generation Call

# BEFORE:
if cfg.get("output", {}).get("enable_charts", True):
    charts_dir = run_dir / "charts"
    chart_files = output_manager.generate_default_charts(
        scores_df=frame,
        episodes_df=episodes,
        cfg=cfg,
        charts_dir=charts_dir,
        sensor_context=sensor_context
    )
    Console.info(f"[CHARTS] Generated {len(chart_files)} charts")

# AFTER:
# Chart generation deprecated - block removed for SQL-only mode performance

Change #2: Verify OutputManager Init

# Ensure sql_only_mode is enabled (now default)
output_manager = create_output_manager(
    sql_client=sql_client,
    run_id=run_id,
    equip_id=equip_id,
    sql_only_mode=True  # Explicit for clarity
)

Task 7.2: Update forecasting.py

Check Dependencies:

grep -n "output_manager\|generate_default_charts\|\.csv" core/forecasting.py

Expected Findings:

  1. OutputManager usage for artifact cache
  2. Possible chart calls
  3. CSV read/write operations

Changes Needed:

Verify Artifact Cache Usage (FCST-15)

# CORRECT: Using artifact cache
scores_df = output_manager.get_cached_table("scores_wide")
health_df = output_manager.get_cached_table("health_timeline")

# If this pattern is used:
if scores_df is None:
    # Fallback to SQL read
    scores_df = sql_client.read_table("ACM_Scores_Wide", equip_id, run_id)

Remove Any Chart Calls

# DELETE if found:
# chart_files = output_manager.generate_default_charts(...)

Task 7.3: Update enhanced_rul_estimator.py

Check Dependencies:

grep -n "output_manager\|write_dataframe\|\.csv" core/enhanced_rul_estimator.py

Expected Findings:

  1. RUL table writes via write_dataframe
  2. Artifact cache reads

Verify:

Task 7.4: Search All Core Modules

Comprehensive Search:

# Find all chart generation calls
grep -r "generate_default_charts" core/ --include="*.py"

# Find all CSV batch writes
grep -r "batch_write_csvs" core/ --include="*.py"

# Find all direct CSV writes
grep -r "\.to_csv\|_write_csv" core/ --include="*.py"

# Find all JSON file writes
grep -r "write_json\|write_jsonl" core/ --include="*.py"

Create Checklist:

Task 7.5: Update Configuration Files

Check: configs/config_table.csv

Look for:

Example Updates:

# Before:
*,output,enable_charts,true,boolean,Chart generation enabled
*,output,enable_csv,true,boolean,CSV file output enabled

# After:
*,output,enable_charts,false,boolean,Chart generation deprecated (SQL-only mode)
*,output,enable_csv,false,boolean,CSV output deprecated (SQL-only mode)

Validation Checklist - Phase 7:


Phase 8: Testing & Validation

Priority: P0 (Critical)
Time Estimate: 2-3 hours
Risk Level: None (validation phase)

Task 8.1: Unit Tests

Test File: tests/test_output_manager.py

Update Test Fixtures

# Remove CSV-related test fixtures
# Remove chart generation test cases
# Add SQL-only validation tests

Run Unit Tests

# Full test suite
pytest tests/test_output_manager.py -v

# Specific test categories
pytest tests/test_output_manager.py -k "sql" -v
pytest tests/test_output_manager.py -k "write" -v
pytest tests/test_output_manager.py -k "cache" -v

Expected Results:

Task 8.2: Integration Tests

Test #1: Full Pipeline Run

python -m core.acm_main --equip FD_FAN

Verify:

Test #2: Batch Processing

python scripts/sql_batch_runner.py --equip FD_FAN --max-batches 5

Verify:

Test #3: Coldstart Mode

python -m core.acm_main --equip TEST_EQUIP --force-coldstart

Verify:

Test #4: Forecasting Module

# After a successful run, verify forecasting works
python -c "
from core.forecasting import generate_forecasts
from core.output_manager import create_output_manager
om = create_output_manager(...)
scores = om.get_cached_table('scores_wide')
forecasts = generate_forecasts(scores, ...)
print('Forecasting works:', len(forecasts) > 0)
"

Task 8.3: SQL Table Validation

Check All Tables Populated:

.\scripts\run_batch_analysis.ps1 -Tables

Expected Output:

ACM_Runs : 118 rows
ACM_Scores_Wide : 1611 rows
ACM_HealthTimeline : 1611 rows
ACM_Episodes : 24 rows
# ... (all 79 tables)

Validation Queries:

-- Check recent run data exists
SELECT TOP 10 * FROM ACM_Runs ORDER BY StartedAt DESC;

-- Verify scores written
SELECT COUNT(*) FROM ACM_Scores_Wide WHERE RunID = @LastRunID;

-- Check analytics tables
SELECT COUNT(*) FROM ACM_HealthTimeline WHERE EquipID IN (1, 2);
SELECT COUNT(*) FROM ACM_Episodes WHERE EquipID IN (1, 2);
SELECT COUNT(*) FROM ACM_SensorHotspots WHERE EquipID IN (1, 2);

Checklist:

Task 8.4: Performance Benchmarking

Baseline (Before Refactor):

# Record timing from previous run
time python -m core.acm_main --equip FD_FAN
# Example: 45 seconds

After Refactor:

# Measure new timing
time python -m core.acm_main --equip FD_FAN
# Target: 30-38 seconds (15-30% improvement)

Metrics to Compare:

Expected Improvements:

Task 8.5: Edge Case Testing

Test Empty DataFrames

# Verify empty DF handling
om.write_dataframe(pd.DataFrame(), Path("empty"), sql_table="ACM_Scores_Wide")

Test Missing Columns

# Verify auto-repair works
df = pd.DataFrame({"timestamp": [pd.Timestamp.now()]})
om.write_dataframe(df, Path("test"), sql_table="ACM_HealthTimeline")

Test Large Batches

# Verify batching/backpressure
large_df = pd.DataFrame({"val": range(100000)})
om.write_dataframe(large_df, Path("large"), sql_table="ACM_Scores_Wide")
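The batching being exercised here amounts to simple row chunking; a pure-Python sketch of the idea (the real batch-size handling lives in the manager's flush logic):

```python
from typing import Iterator, List, Sequence, TypeVar

T = TypeVar("T")

def chunked(rows: Sequence[T], batch_size: int) -> Iterator[List[T]]:
    """Yield successive batches of at most batch_size rows."""
    for start in range(0, len(rows), batch_size):
        yield list(rows[start:start + batch_size])

# 100,000 rows with a 5,000-row batch size flush as 20 full batches:
batches = list(chunked(range(100_000), 5000))
print(len(batches), len(batches[-1]))  # → 20 5000
```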

Test Concurrent Writes

# Verify thread safety
from concurrent.futures import ThreadPoolExecutor
with ThreadPoolExecutor(max_workers=4) as executor:
    futures = [executor.submit(om.write_dataframe, df, Path(f"t{i}"), 
               sql_table="ACM_HealthTimeline") for i in range(10)]
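A thread-safety test should also surface worker exceptions, which future.result() re-raises. A self-contained variant with a dummy write function standing in for om.write_dataframe:

```python
from concurrent.futures import ThreadPoolExecutor, as_completed
from threading import Lock

written = []
lock = Lock()

def fake_write(key: str) -> str:
    # Stand-in for om.write_dataframe; real tests would call the manager
    with lock:
        written.append(key)
    return key

with ThreadPoolExecutor(max_workers=4) as executor:
    futures = [executor.submit(fake_write, f"t{i}") for i in range(10)]
    errors = []
    for fut in as_completed(futures):
        try:
            fut.result()       # re-raises any worker exception here
        except Exception as exc:
            errors.append(exc)

print(len(written), len(errors))  # → 10 0
```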

Validation Checklist - Phase 8:


Phase 9: Code Quality & Cleanup

Priority: P2 (Medium)
Time Estimate: 1 hour
Risk Level: None (quality assurance)

Task 9.1: Linting with Ruff

Run Linter:

ruff check core/output_manager.py

Common Issues to Fix:

Fix Command:

ruff check --fix core/output_manager.py

Expected: 0 errors after fixes

Task 9.2: Type Checking with MyPy

Run Type Checker:

mypy core/output_manager.py --strict

Common Issues:

Fix Incrementally:

# Start with less strict
mypy core/output_manager.py

# Then increase strictness
mypy core/output_manager.py --disallow-untyped-defs

Task 9.3: Dead Code Detection

Using Vulture:

vulture core/output_manager.py

Expected Findings:

Using IDE:

Checklist:

Task 9.4: Final Code Review

Automated Checks:

# Run all quality checks in sequence
ruff check core/output_manager.py && \
mypy core/output_manager.py && \
pylint core/output_manager.py --disable=C0301,C0103 && \
echo "All checks passed!"

Manual Review Checklist:

File Metrics:

# Count lines
wc -l core/output_manager.py
# Expected: ~3,700-4,000 lines (down from 5,520)

# Count methods
grep -c "^    def " core/output_manager.py
# Expected: ~80-90 methods (down from 100+)

# Count classes
grep -c "^class " core/output_manager.py
# Expected: 3-4 classes

Task 9.5: Git Diff Review

Check Changes:

git diff main..refactor/output-manager-bloat-removal core/output_manager.py | wc -l
# Expected: ~2000-3000 diff lines

Review Categories:

Sanity Checks:

Validation Checklist - Phase 9:


Phase 10: Documentation & Rollout

Priority: P1 (High)
Time Estimate: 1-2 hours
Risk Level: Low (final steps)

Task 10.1: Update CHANGELOG.md

Add Entry:

## [v8.1.0] - 2025-11-30

### Major Refactoring: Output Manager Bloat Removal

#### Removed
- **Chart generation infrastructure** (~940 lines)
  - Removed `generate_default_charts()` method and all 16 chart types
  - Removed matplotlib dependencies
  - Charts deprecated in favor of SQL-based dashboards (Grafana)
  
- **CSV file writing** (~80 lines)
  - Removed `batch_write_csvs()` and `_write_csv_optimized()` methods
  - Removed CSV batch infrastructure from OutputBatch dataclass
  - File-based output deprecated for SQL-only operation
  
- **CSV data loading** (~200 lines)
  - Removed `_read_csv_with_peek()` helper
  - Simplified `load_data()` to SQL-only mode
  - Cold-start CSV splitting logic removed
  
- **Helper methods** (~90 lines)
  - Removed `_generate_schema_descriptor()` (file-based schemas)
  - Removed `_generate_episode_severity_mapping()` (JSON output)
  - Cleaned up file path references
  
- **Redundant conditionals** (~24 lines)
  - Removed sql_only_mode checks from deprecated methods
  - Simplified batch flush logic

**Total Reduction:** 1,334 lines removed (24% of original file)

#### Changed
- **OutputManager now SQL-only by default**
  - `sql_only_mode=True` is now the default parameter
  - File-based operations raise deprecation warnings
  - All persistent storage uses SQL Server tables

#### Performance Improvements
- **15-30% faster batch processing** - Eliminated file I/O overhead
- **20-40% less memory usage** - No matplotlib chart buffers
- **Faster imports** - matplotlib no longer required
- **Cleaner logs** - Removed verbose file operation messages

#### Migration Guide
- Chart generation: Use Grafana dashboards querying SQL tables
- CSV exports: Query SQL tables directly for data exports
- File-based config: All configuration in SQL (ACM_Config table)
- Artifact cache: Use `get_cached_table()` for inter-module data exchange

#### Breaking Changes
- `generate_default_charts()` now returns empty list with warning
- `write_json()` and `write_jsonl()` deprecated (return immediately)
- `load_data()` requires `sql_mode=True` and equipment_name
- CSV file mode is no longer supported

#### Backward Compatibility
- Method signatures unchanged (parameters deprecated but not removed)
- Artifact cache API unchanged (FCST-15 compatibility maintained)
- SQL table writes unchanged (all 79 tables still supported)

Task 10.2: Update PROJECT_STRUCTURE.md

Add Section:

## Core Module: output_manager.py

**Purpose:** Unified SQL-based output management and analytics generation

**Architecture:** SQL-only mode (file-based operations deprecated as of v8.1.0)

**Key Components:**
- `OutputManager` class - Main orchestrator for SQL writes
- `OutputBatch` dataclass - Batch operation tracking
- Analytics table generators - 35+ specialized SQL table writers
- Artifact cache - In-memory DataFrame exchange (FCST-15)

**Dependencies:**
- `sql_client.SQLClient` - Database connection
- `utils.timestamp_utils` - Timestamp normalization
- `utils.logger` - Structured logging

**Outputs:**
- 79 SQL tables in ACM database (see ALLOWED_TABLES constant)
- Artifact cache for inter-module data exchange
- No file system dependencies

**Performance:**
- Batched SQL writes with backpressure control
- Connection pooling and health monitoring
- Intelligent column mapping and auto-repair
- Thread-safe concurrent operations

**Recent Changes (v8.1.0):**
- Removed 1,334 lines of CSV/chart bloat (24% reduction)
- 15-30% performance improvement in batch processing
- SQL-only mode now enforced by default

Task 10.3: Update README.md

Update Development Workflow Section:

Development Workflow

Running ACM Pipeline

Standard Mode (SQL-only):

python -m core.acm_main --equip FD_FAN

Batch Processing (Historical Data):

python scripts/sql_batch_runner.py --equip FD_FAN --tick-minutes 1440

Coldstart Mode:

python -m core.acm_main --equip TEST_EQUIP --force-coldstart

Output & Results

All outputs are written to SQL tables:

Viewing Results:

Note: Chart generation and CSV export have been deprecated as of v8.1.0. Use Grafana for visualizations and SQL queries for data export.


Task 10.4: Create Migration Guide

New File: docs/MIGRATION_V8.1.md

Migration Guide: v8.0 → v8.1 (Output Manager Refactoring)

Overview

Version 8.1 removes file-based output operations (CSV, charts) from OutputManager for performance and operational simplicity. All data is now persisted to SQL tables.

Breaking Changes

1. Chart Generation Removed

Before (v8.0):

chart_files = output_manager.generate_default_charts(
    scores_df=scores,
    episodes_df=episodes,
    cfg=config,
    charts_dir=run_dir / "charts"
)

After (v8.1):

# Charts deprecated - use Grafana dashboards
# Method returns empty list with warning
# Remove chart generation calls from your code

Migration: Create Grafana dashboards querying SQL tables:

2. CSV File Writing Removed

**Before (v8.0):**
```python
output_manager.write_dataframe(df, Path("output/scores.csv"))
```

**After (v8.1):**
```python
# Always specify sql_table parameter
output_manager.write_dataframe(
    df,
    Path("scores"),  # Cache key only, no file written
    sql_table="ACM_Scores_Wide"
)
```
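To catch un-migrated v8.0 call sites early, a hypothetical guard can wrap the new API. The wrapper name and error message below are illustrative, not part of the actual OutputManager interface:

```python
def write_dataframe_sql_only(output_manager, df, cache_key, sql_table=None):
    """Refuse writes that do not target a SQL table.

    Illustrative migration shim: in v8.1 every write must name a SQL
    table, so a missing sql_table usually means an un-migrated v8.0 call.
    """
    if not sql_table:
        raise ValueError(
            f"write_dataframe({cache_key!r}) called without sql_table; "
            "CSV output was removed in v8.1"
        )
    return output_manager.write_dataframe(df, cache_key, sql_table=sql_table)
```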

### 3. CSV Data Loading Removed

**Before (v8.0):**
```python
train, score, meta = output_manager.load_data(
    cfg=config,
    sql_mode=False  # CSV mode
)
```

**After (v8.1):**
```python
# sql_mode=True now required
train, score, meta = output_manager.load_data(
    cfg=config,
    start_utc=start_time,
    end_utc=end_time,
    equipment_name="FD_FAN",
    sql_mode=True
)
```

### 4. JSON File Writes Deprecated

**Before (v8.0):**
```python
output_manager.write_json(metadata, Path("metadata.json"))
```

**After (v8.1):**
```python
# Use SQL tables for metadata
# ACM_RunMetadata, ACM_Config, etc.
sql_client.write_metadata(run_id, metadata)
```

## Non-Breaking Changes

### Artifact Cache (Unchanged)

The artifact cache API remains unchanged:

```python
# Still works exactly the same
scores = output_manager.get_cached_table("scores_wide")
```
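Conceptually, the cache behaves like a plain in-memory mapping keyed by table name. A minimal sketch of those semantics (an illustration only, not the actual OutputManager implementation; the `None`-on-miss behavior is an assumption):

```python
class ArtifactCache:
    """Minimal in-memory table cache illustrating the unchanged API shape."""

    def __init__(self):
        self._tables = {}

    def cache_table(self, key, df):
        # Store a DataFrame (or any table-like object) under a string key.
        self._tables[key] = df

    def get_cached_table(self, key):
        # Return None when the key is absent, so callers can fall back.
        return self._tables.get(key)
```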

### SQL Table Writes (Unchanged)

All SQL write methods are unchanged:

```python
# Still works
output_manager.write_dataframe(df, Path("key"), sql_table="ACM_Episodes")
output_manager.generate_all_analytics_tables(...)
```

## Configuration Updates

**Update:** `configs/config_table.csv`

```csv
# Set these to false or remove
*,output,enable_charts,false,boolean,Charts deprecated (v8.1)
*,output,enable_csv,false,boolean,CSV deprecated (v8.1)
```
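A quick sanity check that the deprecated flags are off, assuming the comma-separated layout shown above (scope, section, key, value, type, description); the inline string here stands in for reading `configs/config_table.csv` from disk:

```python
import csv
import io

# Inline rows for illustration; in practice read configs/config_table.csv.
config_text = """\
*,output,enable_charts,false,boolean,Charts deprecated (v8.1)
*,output,enable_csv,false,boolean,CSV deprecated (v8.1)
"""

deprecated = {"enable_charts", "enable_csv"}
for scope, section, key, value, vtype, desc in csv.reader(io.StringIO(config_text)):
    if section == "output" and key in deprecated:
        # Fail loudly if a deprecated output flag was left enabled.
        assert value == "false", f"{key} must be false in SQL-only mode"
print("deprecated output flags are off")
```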

## Testing Your Migration

```bash
# 1. Update your code
# 2. Run unit tests
pytest tests/ -v

# 3. Run integration test
python -m core.acm_main --equip TEST_EQUIP
```

```powershell
# 4. Verify SQL tables populated
.\scripts\run_batch_analysis.ps1 -Tables
```

## Rollback Plan

If you encounter issues:

```bash
# Revert to v8.0
git checkout v8.0-pre-refactor

# Or restore backup
cp core/output_manager.py.backup core/output_manager.py
```

## Support

````

### Task 10.5: Update Inline Documentation

**Add Module-Level Notes:**
```python
# ==================== VERSION NOTES ====================
# v8.1.0 (2025-11-30): Major refactoring
# - Removed chart generation (~940 lines)
# - Removed CSV writing (~80 lines)
# - Removed CSV loading (~200 lines)
# - Removed helper methods (~90 lines)
# - Simplified conditionals (~24 lines)
# Total: 1,334 lines removed (24% reduction)
#
# Performance: 15-30% faster, 20-40% less memory
# Architecture: SQL-only mode enforced
# See: CHANGELOG.md, MIGRATION_V8.1.md
# ======================================================
```

**Validation Checklist - Phase 10:**


## Final Acceptance Checklist

### Functional Requirements ✅

### Performance Requirements ✅

### Code Quality Requirements ✅

### Documentation Requirements ✅

### Testing Requirements ✅

### Rollback Preparedness ✅


## Rollback Plan

### If Critical Issues Discovered

**Immediate Rollback (< 5 minutes):**

```bash
# Option 1: Revert branch
git checkout main
git pull

# Option 2: Restore backup
cp core/output_manager.py.backup core/output_manager.py
python -m py_compile core/output_manager.py  # Verify syntax
```

**Targeted Fix (< 30 minutes):**

```bash
# Create hotfix branch
git checkout -b hotfix/output-manager-issue-X

# Fix specific issue
# ... make changes ...

# Test fix
pytest tests/test_output_manager.py -v
python -m core.acm_main --equip TEST_EQUIP

# Merge hotfix
git checkout refactor/output-manager-bloat-removal
git merge hotfix/output-manager-issue-X
```

### Communication Plan

**If Rollback Required:**

1. Notify team immediately
2. Document specific failure mode
3. Create incident report
4. Schedule retrospective
5. Plan phased rollout for next attempt

## Success Metrics

### Quantitative Metrics

### Qualitative Metrics


## Post-Merge Actions

### Immediate (Day 1)

### Short-term (Week 1)

### Long-term (Month 1)


## Implementation Timeline

| Phase | Time | Dependencies | Blocker Risk |
|-------|------|--------------|--------------|
| Phase 1: Charts | 3-4h | None | Low |
| Phase 2: CSV Write | 2-3h | Phase 1 | Low |
| Phase 3: CSV Load | 2-3h | Phase 2 | Medium |
| Phase 4: Conditionals | 1h | Phase 3 | Low |
| Phase 5: Helpers | 1-2h | Phase 4 | Low |
| Phase 6: Docs | 1h | Phase 5 | None |
| Phase 7: Callers | 2-3h | Phase 6 | Medium |
| Phase 8: Testing | 2-3h | Phase 7 | High |
| Phase 9: Quality | 1h | Phase 8 | Low |
| Phase 10: Rollout | 1-2h | Phase 9 | Low |
| **Total** | **16-23h** | Sequential | - |

**Recommended Schedule:** 2-3 days with testing buffer


## Approval & Sign-Off

**Ready for merge to main:** _______________

**Date:** _______________

**Signed:** _______________


**Document Version:** 1.0
**Last Updated:** November 30, 2025
**Author:** ACM Development Team
**Status:** Implementation In Progress